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ABSTRACT 

We present the cross-identification and source photometry techniques used to process 
Herschel SPIRE imaging taken as part of the Herschel Multi-Tiered Extragalactic 
Survey (HerMES). Cross-identifications are performed in map-space so as to minimise 
source blending effects. We make use of a combination of linear inversion and model 
selection techniques to produce reliable cross-identification catalogues based on Spitzer 
MIPS 24 /mi source positions. Testing on simulations and real Herschel observations 
show that this approach gives robust results for even the faintest sources (5250 ~ 10 
mJy). We apply our new technique to HerMES SPIRE observations taken as part of 
the science demostration phase of Herschel. For our real SPIRE observations we show 
that, for bright unconfused sources, our flux density estimates are in good agreement 
with those produced via more traditional point source detection methods (SussEx- 
tractor; Savage & Oliver et al. 2006) by Smith et al 2010. When compared to the 
measured number density of sources in the SPIRE bands, we show that our method 
allows the recovery of a larger fraction of faint sources than these traditional methods. 
However this completeness is heavily dependant on the relative depth of the existing 
24 /im catalogues and SPIRE imaging. Using our deepest multi- wavelength dataset in 
GOODS-N, we estimate that the use of shallow 24 /urn in our other fields introduces 
an incompleteness at faint levels of between 20-40 per cent at 250 /xm. 

Key words: 



1 INTRODUCTION 

Measuring accurate flux densities for sources in astronomical 
images dominated by confusion noise is the greatest obstacle 
to scientific analysis of data from next generation telescopes 
at far-IR to radio wavelengths. Great advances in the sen- 
sitivity of instruments at these long wavelengths has meant 
that the blended signal from numerous, unresolved, faint 
sources now form a non-negligible fraction of the observed 
telescope background. Hence confusion noise, i.e. fluctua- 
tions in this background, is now the dominant source of noise 
in deep imaging. 

This results in several complications in the analysis 
of low resolution, long wavelength, imaging. Firstly, con- 
fusion acts to increase the positional uncertainty of sources 
dramatically (e.g. Hogg 2001), making cross- identifications 
with other wavelengths problematic. Secondly, correlations 
between the confusing background and sources above the 
confusion limit result in, at best, flux boosting of detected 
sources above the confusion limit and, at worst, complex 
blends of correlated confusion noise, resulting in spurious 
sources (Scheuer & Ryle 1957, Condon 1974). 

In recent history there have been two distinct ap- 
proaches to dealing with these issues. Fairly traditional 
source detection methods, combined with probabilistic ap- 
proaches for flux boosting and source identification have 
been used to good effect on sub- mm surveys performed with 
SCUBA (i.e. Lilly et al. 1999, Mortier et al. 2005, Pope et 
al. 2005., Ivison et al. 2007). 

By comparison others have opted for a more statistical 
approach, choosing to ignore individual sources and look at 
the aggregate properties of sources via either stacking (Dole 
et al. 2006, Pascale et al. 2009, Marsden et al. 2010) or the 
map statistics themselves via the pixel intensity distribution, 
the so-called P{D) (e.g. Patanchon et al. 2009). 
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Both approaches have advantages and disadvantages; 
working with individual sources allows the true variation 
of sub-mm galaxy properties and their correlations with 
other observables to be properly investigated. However, find- 
ing multi-wavelength identifications for individual sub-mm 
sources is usually difficult, and generally reliable identifica- 
tions can be found for only a fraction of sources (Ivison et al. 
2007; Roseboom et al. 2009). Statistical approaches have the 
advantage of using all the available data, and hence provide 
greater precision in the parameters of interest. However, in- 
terpretation of these statistically-derived quantities is some- 
times complicated, and highly dependent on the choice of 
parameterisation. 

Recently several authors have made use of an approach 
which arguably takes the best elements of the three tech- 
niques discussed above. By using a linear inversion tech- 
nique to fit for the flux density of all known sources simul- 
taneously, the ability to work on individual sources is re- 
tained, while the information in the map itself can be used 
to distinguish the contributions from each source. This ap- 
proach has been used by Scott et al. (2002) to fit the flux 
densities of SCUBA 850 fim sources in the 8 mJy survey; 
Magnelli et al. (2009) to fit the Spitzer 24 (im flux den- 
sity of IRAC detected sources in GOODS-N; and also by 
Bethermin et al. (2010) and Chapin et al. (2010) to fit the 
BLAST and BLAST/LABOCA data, respectively, for 24 /zm 
detected sources in the extended Chandra Deep Field South 
(eCDFS). The key to this approach is its simplicity; the only 
assumptions are that all sources are unresolved by the tele- 
scope, and that the positions of all sources are known. If 
these assumptions hold, then in the limit of infinite signal- 
to-noise ratio in the image the resulting flux density mea- 
surements would be perfect, irrespective of source density. 

Here we present a similar technique developed to fit 
the SPIRE band flux densities of 24 fim sources in fields ob- 
served as part of the Science Demonstration Phase (SDP) of 
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the Hersche^ (Pilbratt et al. 2010) mission by the HerMES. 
The SPIRE instrument, its in-orbit performance, and its 
scientific capabilities are described by Griffin et al. (2010), 
and the SPIRE astronomical calibration methods and ac- 
curacy are outlined in Swinyard et al. (2010). Our tech- 
nique is distinct from those discussed above as we include 
an additional model-selection stage to ensure that only input 
sources which are justified by the SPIRE data are retained. 
This stage helps to alleviate the problem of overfitting, i.e. 
fitting more sources than there are independant data points 
to constrain. 

Section [5] describes the datasets used in this work, Sec- 
tion [3TT] presents the linear model used to describe the map. 
Section [4] discusses how model selection can be used to 
"tune" the input list of positions, while Sections [5] and [7J 
present and discuss the results obtained by implementing 
this technique on both simulated and observed Herschel 
datasets. 



2 DATA 

In this paper we make use of Herschel data from HerMES 
taken as part of the SDP of the Herschel mission. HerMES 
performed observations of 5 fields during SDP; these obser- 
vations are described in Oliver et al. (2010a, 2010b) and 
summarised in Table [1] 

SPIRE data are processed using the Herschel Interac- 
tive Processing Environment (HIPE). Details of the SPIRE 
data processing are described in Smith et al. (2010, in prep.), 
however we briefly sumarise the main points here. SPIRE 
maps used in this paper make use of the naive map-making 
algorithm, with no Wiener filtering applied. While the abso- 
lute astrometry of SPIRE imaging is accurate to ~ 2 arcsec 
we apply global corrections to the astrometry of the pro- 
cessed maps, based on stacking at the positions of known 
radio sources. After these corrections have been applied we 
expect our maps to have an overall astrometric accuracy of 
< 0.5 arcsec. 

In addition source catalogues are produced using the 
Sussextractor algorithm in HIPE (Savage & Oliver 2006). 
Although we do not make use of these catalogues in the 
cross identification process, comparisons to them are made 
in Section [7] Throughout we refer to the HIPE processed 
data products by the moniker SCAT (SPIRE Catalogue) of 
which we use the latest v3 internal release. 

Cross identifications are made between these data, and 
archival Spitzer IRAC and MIPS datasets. In the wide, shal- 
low fields, Lockman SWIRE (LH-SWIRE) and FLS, we 
make use of the multi-wavelength catalogues described in 
Vaccari et al. (2010, in prep). In Lockman, these catalogues 
use the Spitzer SWIRE (Lonsdale et al. 2003) dataset as a 
starting point, specifically those sources detected by IRAC 
(at 3.6 /im and/or 4.5 /im. Analogous catalogues are con- 
structed in FLS, using the IRAC source catalogues of Lacy 
et al. (2005) and MIPS source catalogues of Fadda et al. 
(2006). 

1 Herschel is an ESA space observatory with science instruments 
provided by Principal Investigator consortia. It is open for pro- 
posals for observing time from the worldwide astronomical com- 
munity. 



In the deeper fields, archival ancillary data are provided 
by several previous projects. In GOODS-N we make use of 
the Spitzer IRAC and MIPS observations taken as part of 
the GOODS program, specifically the catalogue described 
in Magnelli et al. (2009), which measures the 24 /tm flux 
density using the position of IRAC sources as a prior. 

In the Lockman North (LH-North) region we use repro- 
cessed archival 24 /im data from the GO program of Owen 
et al. 



3 LINEAR FITTING METHOD 
3.1 Basic Equations 

Our data d is an image of dimensions ni x ni = M pixels. 
The pixels are located at discrete positions (x, y). Our model 
assumes this data to be formed by a number of point sources 
with known image coordinates, (u,v), and with unknown 
flux density, f . If each source i makes a contribution to the 
data given by the point response function (PRF) P(x — 
Ui,y — Vi) we can describe the flux density in a given pixel 
j as 

dj = ^ P(xj - Uj, yj - Vi)fj + 5j (1) 

i 

where Sj is an additional noise contribution. Thus the entire 
image d can be described as: 

d = P(AX, AY)f + S, (2) 

where AX and AY define the offset between pixels and 
sources. This is a linear equation of the form 

d = Af + <5. (3) 

Naturally our measures of the pixel intensities d will 
have an associated, and measurable, variance and possibly 
covariance between the pixels, which we define here as Nd = 
(55 T ). 

To derive the maximum likelihood solution, we write 
down the likelihood as the Gaussian probability function for 
the data given the flux densities: 

£(f) = p(d|f) 

cx |N d r 1/2 cxp|-i(d-d) T N d - 1 (d-d)| 

where we define d as the data resulting from a given set of 
flux densities f. Defining \ 2 — (d — d) T N _1 (d — d) we see 
that at the maximum of the likelihood we require \ 2 to be 
at a minimum. However it can be seen that d = Af, so 

X 2 = (d- Af) T N d - 1 (d- Af). 

Hence at the minimum 

= ^4- 
df 

= A T N d _1 Af- A T N d _1 d, 

so the maximum likelihood solution can be written as 

f = (A T N d " 1 A)- 1 A T N d " 1 d, (4) 

An equation which is familiar from maximum likeli- 
hood map-making for both sub-mm and CMB experiments 
(e.g. Tegmark 1997, Patanchon et al. 2008, Cantalupo et al. 
2010). 
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Table 1. HcrMES SDP Observations (Oliver et al. 2010a, 2010b). Size is approximate extent of region with uniform coverage. Repeats 
is total number of pairs of scans in both A and B directions. Sensitivity is that for a point source, ignoring confusion noise. S24 refers to 
existing MIPS observations in these fields. 



Field Name 




Size 


RA 


Dec 


Repeats 


S250 


•S24 








deg 


deg 




(mjy, 5cx) 


(/ijy, 95 per cent completeness) 


A2218 


9' 


' x 9' 


248.98 


66.22 


2 


2.5 


N/A 


GOODS-N 


30' 


' x 30' 


189.23 


62.24 


30 


4. 


50 


LH-North 


35' 


' x 35' 


161.50 


59.02 


7 


8. 


80 


FLS 


155' 


' x 135' 


258.97 


59.39 


2 


12.5 


400 


LH-SWIRE 


218' 


' x 218' 


162.00 


58.11 


2 


23. 


200 



3.2 Estimating the errors 

As this is a linear system the Fisher information matrix can 
be seen to be 

I = A T Nd _1 A < Nf _1 , (5) 

which, by the Cramer-Rao inequality, is the inverse of the 
lower limit of the covariance matrix of the source flux densi- 
ties (Nf ). Thus the covariance matrix is simply the inverse of 
the matrix A T N^" 1 A in Equation [4] Intuitively this makes 
sense, if there are no overlaps between the sources then X 
would be a diagonal matrix with each entry corresponding 
to PRF T N d _1 PRF i.e. £ Pi?.F 2 /(<5 2 }, where (&) is a S ain 
the noise in a pixel i. 

While we can solve Equation U for the flux densities 
/ via some fast iterative method, to get the variances we 
must invert I by 'brute-force'. However, the matrix is posi- 
tive symmetric and highly optimised inversion codes for this 
class of matrix exist. Here we invert the I directly using 
LAPACK/BLAS routines. 

One drawback to this approach is that we will always be 
limited to the lower limit of the covariance matrix, given the 
inequality presented in Equation [5] One alternative would 
be to use Monte Carlo Markov Chain (MCMC) methods 
to fully map the posterior probability distribution, allowing 
the true variance to be properly characterised, as will be 
discussed in Section O 

3.3 Background or other source terms 

We can consider other additive model contributions to the 
signal in an obvious way, by including extra terms in Equa- 
tion [2] and calculating the matrix A accordingly, with the 
vector f then representing all the model parameters. For 
example a constant flat background would be a single ex- 
tra element in the vector f with a corresponding row of 
(1, 1, 1, 1) in the matrix A. More complicated model back- 
grounds with more parameters can be included by adding 
extra terms to f and A. The ability to do this is particu- 
larly useful for our application to Herschel SPIRE imaging, 
as some astronomical flux is lost when removing the tele- 
scope background in the map making process for SPIRE 
imaging. 

4 OPTIMISING THE INPUT LIST 

The linear technique should return the optimal solution for 
a complete input list, containing the precise position of ev- 



ery source contributing flux to the map. In practice we can 
never have a precise input list, because some sources will 
be missing due to flux density limits or masking in the an- 
cillary data while some sources we include may in fact be 
spurious or emit no flux at the wavelength under investiga- 
tion. A further complication is that most sub-mm facilities 
such as Herschel are not designed as absolute flux measuring 
devices; the mean level is lost when removing the telescope 
background. Hence the zero point of the map does not cor- 
respond to zero flux density, but rather an unknown mean 
level, and the faintest sources will appear as fluctuations 
about this point. 

These issues become problematic at high source density, 
as degenerate solutions to the linear problem become more 
common. To highlight this consider the most extreme case, 
using deep optical catalogues as an input to Herschel SPIRE 
imaging. The number density of optical sources with B < 
28 is roue hly 10 6 deg" 2 (Furusawa et al. 2008); given that 
the SPIRE beam size at 250 ^m is ~ 3 x 10~ 5 deg 2 the 
expected number of optical sources per SPIRE beam is ~ 30. 
Of course not all of these are going to be luminous at 250 /im 
so we need some way of culling those sources which are too 
faint to be present in our maps. 

There are two clear approaches to reducing the input 
list. One method would be to consider properties of the input 
list, such as the probability of chance alignment given the 
number density of sources of that flux density (e.g. Downes 
et al. 1986, Lilly et al. 1999) or the likelihood that a par- 
ticular source would be sub-mm luminous given its multi- 
wavelengths properties (e.g. Pope et al. 2006, Yun et al. 
2008, Roseboom et al. 2009). 

An alternative is to let the sub-mm data discriminate. 
The matrix A in Equation [3] is essentially a model we are 
trying to fit to the data, with the number of free parameters 
equal to the number of input sources. However we need to 
consider the possibility there may be a better model which 
needs fewer free parameters (sources) to sufficiently describe 
the data. The use of model selection techniques such as this 
is common and often rely on criteria such as the Bayesian 
information criterion (BIC; Schwartz 1978) or the Akaike 
information criterion (AIC; Akaike 1974). 

Both approaches have advantages at different angular 
scales. The first approach, which tries to calculate the prob- 
ability of a chance superposition, is heavily biased towards 
bright counterparts, and ignores any possible correlations 
in the clustering at different wavelengths. However the lat- 
ter approach of letting the data discriminate will not give 
good results for heavily blended sources (i.e., source seper- 
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ation much less than the beam FWHM). Thus we want an 
approach which will incorporate the best elements of both 
techniques, as detailed in the following sections. 

4.1 Segmenting the map 

The biggest problem with implementing any model selec- 
tion approach to source detection and extraction is that in 
a nai've implementation the number of calculations required 
is 2^, where iV is the number of sources to be considered. 
Given that a typical 1 deg 2 map may contain many thou- 
sands of sources, some cost saving measures must be intro- 
duced. The first step is to segment the map into isolated 
regions in which sources may contribute significantly, but 
not affect other sources outside it. For this step we can re- 
use the Fisher information matrix calculated in Equation 31 
1 = A T N~ 1 A. The non-diagonal elements of I, i, j for j =^ i, 
describe the fractional contributions the source j makes to 
the noise-weighted, PRF convolved flux density in the map 
at the position of source i, i.e. Ylj-^-.ifj — m »j where- rrii is 
the flux density in a PRF convolved map at the position of 
source i, weighted by the pixel noise. Thus we isolate regions 
of blended sources by taking sources to be paired if 

Xijmj > 1, 

i.e. if the flux density contributed by one source to another 
is greater than the la pixel noise. 

One problem with this approach is that it assumes we 
already know the flux densities of the sources in the map. 
However, given that we are only trying to segment the map, 
it should suffice to use some initial estimate of the flux den- 
sities /. Here we choose to simply use the PRF convolved 
flux density at the position of source i, irrespective of its 
neighbouring sources, i.e /o = A T N~ 1 d/ldiz S where only 
the diagonal elements of I are considered. In this framework 
fo can be recognised as the upper limits to the flux densi- 
ties. Chains of connected sources are identified by starting 
at one source, and going through all the elements of its row 
in X, grouping the connected sources. After this first step it- 
erations of this same process are repeated on all the sources 
in the group, until the group does not continue to grow in 
size. 

4.2 Using model selection for source detection 

Once the map has been segmented into groups we can use 
model selection to decide which sources in each group are 
justified by the data. However, the number of calculations 
to be performed is still 2^, where N is now the group size. 
As discussed above, in heavily confused images the number 
density of input sources could be as high as 10 per beam 
element, resulting in a very large number of calculations 
to be performed. As an alternative we adopt an iterative 
'top-down' approach in which we jackknife the input list 
i.e. consider all the models which have N — 1 sources, and 
select the best model. The process is then repeated with 
N — 2, N—3, N — s sources, until a better model cannot be 
found. Models are compared using the Akaike Information 
Criteria, corrected for finite sample sizes (AICc); 

AICc = 2k - 2 ln(£) + 2k ( k + 1 ) 
ti — k — 1 



Where k is the number of parameters, £ is the likelihood, 
and n is the total number of data points used in the fit. 
While the BIC could have been used, here we choose the AIC 
as it penalises extra parameters less harshly than the BIC. 
Since our parameters are actual known sources (as opposed 
to simply free parameters in a model) we have good reason 
to believe they should be included unless there is evidence 
to the contrary. For our source fitting — 21n(£) = \ 2 model 
so the AICc becomes: 

AICc^ + 2fc+ H^_til, 

where \ 2 is calculated on the fit to the map segment. Since 
we are fitting in source space, n here is the original number 
of sources considered and in the first step k = n. 



4.3 Weighting the input list 

A number of well-established probabilistic approaches ex- 
ist for weighting identifications of low resolution, sub-mm 
sources. We require something more, because the traditional 
techniques require a source detection stage, which is absent 
from our methodology. What we wish to know is the like- 
lihood that a particular input source is luminous (or more 
practically, detectable) in the sub-mm band of interest. One 
way to do this would be to consider the existing full multi- 
wavelength dataset for each input source, and predict the 
sub-mm flux density and its variance from the full range 
of plausible SEDs. While this would in principle return the 
best results, implementation of such an approach would be 
difficult and give mixed success due to the heterogeneous 
nature of most multi-wavelength data sets. 

A simpler alternative is to weight the models by how 
likely they would be to appear by chance, i.e. what is the 
likelihood that a source is a random superposition? This 
approach is analagous to the 'p statistic' analysis (Downes 
et al. 1986); however, in our implementation we do not have 
positions for our sub-mm sources and hence cannot work 
out the probability of finding a counterpart within a given 
search radius and separation. 

Since the AIC offers a relative comparison of models, 
the absolute likelihood here is not important, thus we intro- 
duce a more nai've, but useful, estimate of the probability 
of a chance alignment. For a given source i we calculate the 
probability 4>i of finding a source in the input catalogue with 
flux density F greater than the source under consideration 
Fi within an area of one beam element A = ttFWHM 2 /4 In 2: 

4>i = A/f>f ; a. 

Where M/> f in is the number density of sources present 
in the input list with F > Fi. We add this probability to 
the model selection stage and hence the AICc calculated for 
each model becomes; 

AICc = X 2 + 2fc + 2 x In g>) + 

where the ^ <j>i runs over all of the sources which are as- 
signed zero flux density via the model testing or the fitting 
process itself. 
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5 THE HERMES CROSS-IDENTIFICATION 
ALGORITHM 

As HerMES will identify > 200,000 sources in the Herschel 
SPIRE bands across all of our survey fields (Oliver et al. 
2010a), we need an algorithmic, machine-based, approach 
to producing cross-identifications (XIDs) across the many 
data sets present in our fields. To achieve this we utilise an 
implementation of the method described above. One of the 
key features of HerMES is that all of the planned survey 
fields contain existing Spitzer data from a range of legacy 
surveys. More importantly the tiered nature of HerMES is 
well matched to the variable quality of the Spitzer data, 
in particular the MIPS 24 /im observations. This is high- 
lighted by comparing the S250 an d S24 sensitivities in Table 
[1] With the exception of FLS, all of the SDP observations 
have a limiting S250 / S24 colour of ~ 100. Using a compila- 
tion of pre-Herschel emphirical models (Fernadez-Conde et 
al. 2008; Le Borgne et al 2009; Franceschini et al., in prep; 
Pearson et al., in prop; Valiante et al. 2010; Xu et al. 2001) 
we estimate that 0.4-24 per cent of S250 > 1 mJy sources 
have S250/S24 > 100, with the majority of these (up to 70 
per cent) lying in the range 1.2 < z < 1.6 where the 24 /im 
band is coincident with the 10 /im silicate feature present in 
strong absorption in typical starburst galaxies. 

It is also clear from existing measurements of the cos- 
mic IR background (CIRB) from BLAST (Devlin et al. 2009) 
that sources already detected at 24 pirn with Spitzer are the 
dominant contributor at these wavelengths. In particular, 
Pascale et al. (2009) show that greater than 90 per cent of 
the CIRB at BLAST/SPIRE wavelengths can be accounted 
for by 24 /xm sources with 5*24 ^ 100 /iJy. Hence we can be 
confident that using the 24 fim source lists as a model for 
the positions of sources in the SPIRE maps is appropriate. 
It is also worth considering that in the deepest fields (i.e. 
GOODS-N) the source density of 24 ^m sources is ~ 24,000 
deg" 2 , or ~ 2 SPIRE 250 pm beam elements per source. 
Thus even recovering the SPIRE fluxes for the detected 
24 (jm sources, involves going significantly beyond the con- 
fusion limit. 

The full algorithm used to produce the XID cata- 
logue for the SDP observations is described below. While 
it would be possible to include Gaussian priors on the 
SPIRE flux densities, at this stage we do not understand 
the SEDs of our SPIRE sources well enough to accurately 
predict the range of SPIRE flux densities from the existing 
Spitzer and short wavelength data (i.e. Rowan-Robinson et 
al. 2010). Hence only the simple non-negative flux density 
prior, Sspire ^ 0, has been implemented. This is achieved 
by using the Bounded Variable Least Squares (BVLS) algo- 
rithm described by Stark & Parker (1995) to perform the 
matrix inversion. 

It should be noted that this algorithm has been devel- 
oped in parallel with the other data reduction techniques 
(i.e. Smith et al. 2010; Levenson et al. 2010) for use in the 
first SDP science papers from HerMES. Thus while this ap- 
proach has proven to give the best performance under test- 
ing, it is clear that several aspects could be easily improved. 
However, to maintain consistency with the results presented 
in other HerMES SDP papers we only consider our original 
algorithm in the following. A description of problematic as- 



pects of this approach, and how they may be improved in 
future applications, is presented in Section [9] 

5.1 Step-by-step description of HerMES XID 
algoritm 

For the 250 fj,m band we follow these specific steps: 

1) Produce input list from available 24 (jm source cata- 
logues. Sources are considered if they are detected at 5a in 
the MIPS 24 /jm imaging, and if they are above a given flux 
density limit: 20 [iJy for GOODS-N, 50 ^Jy for LH-North, 
and 200 ^Jy for FLS and LH-SWIRE 

2) Calculate the matrices needed for the inversion method 
using the input list, PRF model, and SPIRE 250 /im map 
and variance (i.e. A, N and d from Equation [4| . 

3) Generate the matrix X = A T N~ X A and m = A T N _1 d. 

4) Segment the map using information contained in X and 
m. Segments are produced by weighing the contribution 
from source blending against the instrumental noise, as de- 
scribed in Secti0n r4.ll In practice this method produces seg- 
ments too large to be solved in a reasonable time, thus we 
add an extra factor, Ad, to the instrumental noise in quadra- 
ture. For the catalogues described here, Ad = 1 mJy in all 
cases. Thus sources are segmented into groupings where no 
external sources contribute more than (crf nst + A 2 ;) 1 / 2 to 
any given source within the segment. In practice this extra 
term is thought to be less than the errors introduced by 
the unknown background (2-3 mJy), and the incomplete- 
ness of the input list, characterised by the surface brightness 
of sources undetected at 24 fi (2-3 mjy/beam, predicted by 
FC08 mocks) and hence has a negligible effect on the quality 
of the output catalogues. A given source is allocated to ex- 
actly one segment, such that the algorithm returns a single 
estimate of the flux density for each input object. Typical 
segments are 10-50 sources in size for our deepest fields, 
with a maxmium size of ~ 200. 

5) For each segment: 

5.1) Build the smaller X' and m' for this segment from X 
and m. 

5.2) Build the noise weighted mini-map of the segment re- 
gion d' = dN^ 1 . 

5.2) Add a local flat background under the segment to X' 
and m'. The response of the background is taken to be 
1/M' src , where M' Brc is the number of sources in the seg- 
ment. It is necessary to fit this background to recover some 
of the astronomical flux lost to the telescope background 
in the map-making process. Note that we do not allow this 
parameter to be removed by the model selection stage. 

5.3) Solve X'f = m' for source flux densities f ' using BVLS 

5.4) Calculate the initial \ 2 and AICc from the solution in 
5.3 and mini- map d'. 

5.5) Iteratively search for the minimum AICc, starting 
with i = 1: 

5.5.1) Fit the segment with all M s ' rc — i combinations; 

5.5.2) Measure \ 2 an d AICc values for each combination; 

5.5.3) From the set of M s ' rc — i AICc values, identify the 
minimum; 

5.5.4) If min[AICc(Af s ' rc -i)]<min[AICc(M s ' rc -i + l)] 
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then remove the source corresponding to that model from 
consideration, increment i, and go to step 5.5.1. Otherwise 
go to step 5.6. 

5.6) Calculate the lower limit to the covariance matrix for 
the sources in this segment by directly inverting I' using 
LAPACK/BLAS routines. X' at this stage contains only the 
sources which have not been removed by the model selection 
stage (5.5). 

5.7) Use the covariance matrix to find the maximum abso- 
lute value of the Pearson correlation coefficient^ for sources 
in the segment. This can be used later to identify heavily 
blended sources which cannot be recovered by this method. 

6) Write out the measured quantities (flux density, error, 
background for each segment, \ 2 an d correlation). 

5.2 Initial results from processing HerMES 
observations 

The HerMES XID algorithm has been used to produce cata- 
logues in each of the SDP fields, with the exception of Abell 
2218. The typical fraction of sources removed by the model 
selection stage is 20-40 per cent, although this is strongly 
dependent on the depth of the 24 /im input list. Rare cases 
do occur where all sources are retained or only 1 source is 
retained in a segment. To illustrate this, in the GOODS-N 
field 47 per cent of input sources are rejected by the model 
selection. These sources have a median 24 /jm flux density 
of 64 /ijy while only 5 per cent have a 24 /jm flux density 
greater than 150 /ijy. By contrast, 21 per cent of the input 
sources in LH-S WIRE are rejected, with a median 24 /im 
flux density of 260 /iJy and 5 per cent being greater than 
700 /LtJy. 

In order to achieve consistency between the 3 SPIRE 
bands we only carry out the model selection stage of the 
algorithm for the 250 /im band. An alternative approach, 
where all three SPIRE bands are treated independently, was 
initially considered, but found to give poor results. In par- 
ticular the increase in beam size from 250 to 350 to 500 fim 
results in a decreased ability to deblend at long wavelengths 
and a preference to retain fewer sources. This naturally 
leads to inconsistencies between the measurements in the 
different bands. Thus it was decided to use the 250 /im re- 
sults to determine which sources were indeed present at the 
SPIRE bands. One downfall of this approach is that some 
faint sources will be missed, where the observed-frame SED 
peaks longward of 250 /im. However, it was decided that 
these 250 fim faint sources would be too hard to recover 
reliably at this stage. Again we can use the pre-Herschel 
mock catalogues of Fernandez-Conde et al. (2008) to es- 
timate the number of 350 and 500 /im sources missed by 
this requirement. Assuming a uniform sensitivity across the 
SPIRE bands, and the depth of the deepest field considered 
here (GOODS-N; 4 mjy), the number of sources the addi- 
tional incompleteness due to requiring a detection at 250 
/xm is only 0.5 per cent. However it is clear that this esti- 
mate is highly dependant on the range of SEDs used in the 
Fernandez-Conde et al. models. 

Ideally the model selection would be performed over 

2 Pearson correlation coefficient is: r = covi j / 'uitjj 



all three bands concurrently, such that evidence in any one 
band for a particular source would cause it to preferred by 
the model. However, it was not possible to implement this 
approach in time for the SDP papers. 

Of course even with these additions we are still limited 
to those sources which are detected at 24 /im. An additional 
step to find entirely new sources in the residual maps, using 
the AICc to determine their significance, would rectify this. 
Again it was not possible to implement such a feature in 
time for the SDP papers. 

For the 350 and 500 /im bands only sources which are 
found to have 5*250 > lc are considered. The flux densities 
for the 350 and 500 /im sources are then measured using 
steps 1-5.4 and 6 only. 

XID catalogues have been produced in this way for the 
SDP fields described in Table[T] As an input to the algorithm 
we take the 24 /im source catalogues described in Section [2] 
and the known PRF. Testing on bright point sources has 
shown that the PRF can be adequetely described as a 2D 
Gaussian with FWHM=18.15, 25.15 and 36.3 arcsec, for the 
250, 350 and 500 /mi bands, respectively. 

While the input source list is defined by the 24 fim flux 
density limits, we use source positions from Spttzer IRAC 
3.6 /im imaging where there is deep co-incident data and 
previous associations between the two data sets have been 
made. This occurs in all of our fields, with the exception of 
LH-North, and the wider area of HDF-N. The IRAC posi- 
tional accuracy is typically ~ 0.2 arcsec (as opposed to ~ 1 
arcsec for 24 /im) and hence using these eliminates any er- 
ror in the flux density solutions introduced by astrometric 
errors. 

The resulting HerMES XID catalogues contain the com- 
plete input 24 /im source catalogue, as well as any previously 
associated data sets at other wavelengths (see Vaccari et al. 
2010, in prep.), as well as the best estimate of the SPIRE 
flux density for each 24 /im source passing our input selec- 
tion criteria. 

In addition to the flux density and error in each band for 
each input 24 /im source, the SPIRE component of the XID 
catalogues contain a number of extra columns describing 
diagnostics of the fitting process and local source confusion. 
These extra measures include: 

• Maximum absolute value of the Pearson correlation co- 
efficient calculated on the covariance matrix of the flux den- 
sity solution (hereafter refered to as p). 

• x 2 °f the source solution in the neighbourhood of source 
(7 pixel radius). 

• The background level estimated in the fitting. 

• The number of sources in the segment containing this 
source. 

• The ID number of the segment. 

• The PRF-smoothed flux density at the position of this 
source, ignoring contributions from neighbouring sources 
and the background. 

• The number of 24 /im sources within a radius of the 
FWHM with greater than 50 per cent of the flux density of 
this source. 

• The 'purity' of the SPIRE flux density, based on the 
ratio of this source's 24 /im flux density to the 24 /im flux 
density smoothed with the SPIRE PRF at this position (see 
Brisbin et al. 2010). 
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The reason for including these extra columns is to en- 
able samples of varying quality to be extracted from the XID 
catalogues based on differing scientific requirements. From 
an early assessment of the XID algorithm performance the 
recommended quality cuts for typical science applications 
were: 

• Sx > 5 x ASa; 

• p< 0.8; 
• 



X 2 <5. 



Given the more detailed analysis presented below, these 
cuts have proven to return very reliable samples, although 
possibly at the expense of completeness. Hence they rep- 
resent fairly conservative guidelines for the use of the XID 
catalogues. 



6 TESTING ON SIMULATIONS 

To quantify the effectiveness of these new techniques we con- 
sider simulated SPIRE images. Here we consider two simu- 
lated cases: a 'deep' map, where cr con f 3S> 0"inst; and a 'shal- 
low' map where <7i ns t ^ cr C onf- ^ n eacn case we simulate a 
2.2° x2.2° patch of sky in all three SPIRE bands, taking 
the mock catalogues of Fernandez- Conde et al. (2008, hence- 
forth FC08) as an input. While many mock catalogues exist 
at these wavelengths, the FC08 mocks were found to give 
the best match to the observed confusion noise and source 
colours in real SPIRE data. One key feature of the FC08 
mock catalogues is that they incorporate a prescription for 
the clustering of sources (albeit flux and SED independent). 
This characteristic is of particular importance as the cluster- 
ing introduces correlations between the resolved sources and 
the confusing background of the sort present in the real data. 
Additionally the FC08 mocks incorporate a semi-realistic 
range of SED types, and their evolution, based on a combi- 
nation of detailed modelling of local sources, and constraints 
placed by pre-Herschel number counts at Spitzer, ISO and 
SCUBA wavelengths. 

Simulated maps are produced from the positions and 
flux densities quoted in the mock catalogues by first mak- 
ing noise-free maps in each band, using the known SPIRE 
PRF parameters. Secondly, Gaussian noise and a flat back- 
ground are added. To give the best possible correspondence 
to the real observations, this second step is repeated, vary- 
ing the Gaussian noise and background, until the best match 
to the P(D) in the observed SPIRE maps is found. For the 
deep scenario we match the observations in our GOODS- 
N observations, while for the shallow simulation we match 
to observations in LH-SWIRE. Given the confusion noise at 
SPIRE bands is known to be ~5-7 mjy (Nguyen et al. 2010, 
Smith et al. 2010), these scenarios represent the confusion- 
noise-dominated, and instrument-noise-dominated cases, re- 
spectively. 

Figure [1] compares the P(D) distributions for HerMES 
SPIRE observations in GOODS-N and LH-SWIRE to cor- 
responding simulations. Table [5] lists the background and 
Gaussian noise added to each pixel in the simulated map in 
order to match the observations. 

A mock 24 /im input catalogue is produced by cutting 
the FC08 simulation at a level representative of the qual- 
ity of the 24 data in our observed fields: S24 > 50/iJy 
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Figure 1. Normalised distribution of intensity values in each 
pixel {P(D)) for the SPIRE observations in GOODS-N (left; 
dashed), LH-SWIRE (right;dashed) and our deep and shallow 
simulated maps (left and right respectively; solid lines), based on 
Fernandez-Conde et al. (2008). 



for the deep simulation, and S24 > 200piJy for the shallow 
simulation. In addition to these flux density limits some real- 
istic limitations on source confusion in 24 /im detected source 
lists are imposed. As the beam size of MIPS 24 fim imaging 
is 6 arcsec very few sources appear with seperations of < 3 
arcsec, and those which do quite often turn out to be unre- 
liable. Thus mock source pairs which are seperated by less 
than half of the Spitzer MIPS 24 fim beam (3 arcsec) are 
filtered, with preference given to the brighter source. Ad- 
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ditionally, sources within the Airy profile of bright 24 pm 
sources are also removed. The first Airy ring of the MIPS 
24 fim PRF has a peak level of ~10 per cent of the peak of 
the PRF0. Hence we filter pairs of sources on the scale of 
8 arcsec, with flux density ratios of greater than 10, again 
giving preference to the brighter source. 

Simulated XID catalogues are produced using the Her- 
MES XID algorithm outlined above (herefter refered to as 
Method A). We compare our approach with two previously 
adopted XID methods for far-IR datasets using the same 
simulations: A catalogue-space method, using a combina- 
tion of the Sussextractor algorithm and p-statistic match- 
ing (Method B), and a variant of existing linear inversion 
methods, based on that presented in Bethermin et al. (2010) 
(Method C). For method C we filter pairs in the 24 pm input 
catalogue at seperations of less than 20 arcsec, giving pref- 
erence to the brighter source at 24 /jm, as per Bethermin et 
al. (2010). In addition for method C we make use of a conju- 
gate gradient method with no flux density priors to perform 
the inversion, as opposed to the BVLS method with a non- 
negative prior described above. 

The performance of source extraction and cross- 
identification methods are typically characterised by two 
metrics: the completeness, i.e. the fraction of sources recov- 
ered at a given flux density; and the reliability or mis-ID 
rate. While the notion of completeness translates well to the 
methods presented here, reliability is not an intuitively use- 
ful quantity when performing XIDs in the map-space. We 
know (or assume) that all of our 24 pm sources are reliable; 
the aim is solely to accurately measure their flux densities 
at other wavelengths. Thus the second metric by which we 
judge our XID methods is flux density accuracy. 

In constructing the simulated catalogues for all three 
methods we need to make some XID and flux density quality 
cuts. 

For the method A we select all sources from the output 
catalogues. Additionally we define a second sample using the 
X 2 < 5 and p < 0.8 selection thresholds described in Section 
[5] To emphasise the effect these additional cuts have on the 
completeness and flux density accuracy we denote the use 
of these additional quality cuts Method A'. 

For method B we take all sources in the Sussextractor 
output lists and try to find matches in the mock 24 /im cat- 
alogue within a search radius of 10 arcsec, 14 arcsec and 
20 arcsec for the three SPIRE bands, respectively. For all 
sources within the search radius we calculate the p-statistic 
of the match using the formula of Downes et al. (1986), tak- 
ing those with p < 0.1 to be possible counterparts. Cases are 
excluded where there are multiple counterparts with p < 0.1 
for a single detected source. 

For method C we take all sources in the output cata- 
logue. 

Figures [2] and [3] present the completeness and flux den- 
sity accuracy for the three methods. In both Figures com- 
pleteness is defined as the fraction of sources in the output 
catalogues recovered at 5er significance and satisfying the 
above conditions, to the total number of sources in the orig- 
inal FC08 mocks. By contrast the flux density accuracy is 
measured across all recovered sources, irregardless of signif- 

3 see http : //ssc . spitzer . caltech . edu/mips/mipsinstrumentha 



icance, although it should be noted that for a source to ap- 
pear in the Sussextractor list it must be detected by that al- 
gorithm at > 3<r. A summary of the key statistics is also pre- 
sented in Table [3] Both map-based methods (A and C) have 
higher completeness at faint flux densities at all wavelengths. 
It should be noted that for the method B it is the require- 
ment that an ID be 'secure' that forces the completeness 
to be low. As shown in Smith et al. (2010), the complete- 
ness of the Sussextractor-alone catalogues is comparable to 
that achieved by the HerMES XID algorithm (method A). 
In the deep simulations, the p < 0.8 cut imposed in method 
A' has a similarly dramatic effect on completeness in the 
350 and 500 pm bands. This is primarily due to the lower 
resolution and 250 /xm-based input list used in the longer 
wavelength bands. As no model selection stage is performed 
on the 350 and 500 pm images we are attempting to fit more 
sources than can be resolved in the map, leading to strong 
degeneracies between close pairs. This is understandable, as 
the typical separation between input sources to the 350 and 
500 pm map is 1.5 and 1 pixel (15 arcsec), respectively. It 
should be noted that similar degeneracies in the simple lin- 
ear inversion are removed by the initial spatial filtering of 
the input list. 

The completeness of the method A (and A') is not con- 
sistently better than the other methods; which can even 
be superior at bright flux densities in the long wavelength 
bands. However, the flux density accuracy of the Hermes 
XID algorithm is consistently better. This is most striking 
in the deep simulation, where the flux density accuracy of 
the method A, and in particular method A', is not only 
better, but has significantly fewer sources which have been 
boosted to erroneously high flux densities. 

Another feature which is clear from Figures [2] and [3] is 
that the mean flux density error is always negative for the 
linear methods, i.e. method A and C systematically under- 
estimate the flux density of sources. The mean S b a — Strue 
for each method and simulation is given in Table [3] The ori- 
gin of these negative offsets can be attributed to fact the 
maps have a mean of zero, whereas we know that there is 
an unresolved background of sources contributing to each 
pixel (or beam) in the maps. For traditional source detection 
and extraction methods this is preferable, as fluctuations in 
the confusing background appear as quasi-symmetric noise 
about zero, and hence can be treated as another pseudo- 
Gaussian noise term. 

However for our Method A and C the number density 
of our input list is much higher than could not be identi- 
fied in the map blindly. Hence we are attempting to 're- 
solve' some of the confusing background, which is made up 
from the contributions of many faint unresolved sources, into 
source flux. This is why this feature does not appear in the 
results for Method B. While the local background fitting 
added to Method A goes some ways to alleviating this it 
is clear that this approach is not completely effective. The 
other two methods (B and C) do not consider any non-zero 
background. Simultaneously fitting a solid background un- 
der the entire map would almost certainly resolve this issue, 
however it is not computationally feasible to solve for more 
than a few hundred sources at once, and hence this is not 
currently possible. An alternative is to iteratively solve for 
the background, i.e. fit, and remove, all the known sources in 
Ithe map (considering no background) and then calculate the 
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Table 4. Best fit Gaussian parameters to the normalised flux 
density error distributions shown in Figure [4] and equivalent dis- 
tributions for the shallow simulation. All measurements in units 
of the estimated error c cata i ogue 
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mean of the residual map. After one pass this will not give 
an accurate estimate of the true background, as the source 
fluxe densities will be underestimated and hence some flux 
will remain from known sources. However if we repeat this 
process a number of times, until the mean of the residual 
map converges, this will give an accurate estimate of the 
background due to unknown confusing sources. An approach 
similar to this will likely by used in the next iteration of the 
HerMES XID algorithm. 

While high flux density accuracy and completeness are 
a key aim for any XID method, it is also vital that our 
proposed method return reliable estimates of the flux den- 
sity error, as for real applications we will not have knowl- 
edge of the true flux density of our sources. In Figure 0] 
we show the distribution of observed flux density error (i.e. 
Sobs-Struc), normalised by the error estimated by the pho- 
tometric pipeline for the deep simulation. The first obvious 
feature of these distributions, as previously discussed in ref- 
erence to Figures [2] and [3l is that the peak in flux density 
error distribution is always negative. One side-effect of this 
systematic negative offset is that it makes the definition of 
the catastrophic failure rate problematic, if we simply take 
the number of sources which have absfS'oba-S'true], greater 
than 3(7 C ataiogue then a very large fraction of sources will be 
considered failures. Thus we take an alternative approach, 
as we ultimately want to treat our flux densities errors as 
Gaussian, it makes sense to fit the distributions shown in 
Figure [4] with a Gaussian, considering the amplitude, mean 
and sigma as free parameters. Table U describes the param- 
eters of the best fit Gaussian to each of the distributions 
shown in Figure [4] 

It is clear that for deep observations the quoted cat- 
alogue error from the HerMES XID method (Method A) 
underestimates the true error by a factor of at worst ~ 5. 
This is consistent with the values quoted in Table as the 
typical catalogue error estimated for the deep simulated cat- 
alogues is 0.9, 0.7 and 1.5 mjy for the 250, 350 and 500 /im 
bands, respectively. Fortunately the situation is much bet- 
ter once we reach the level of the shallow simulations, where 
the catalogue errors are consistently within a factor of 2 of 
the true error. The underlying reason for this discrepency 
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Figure 4. Distribution of observed flux density error, normalised 
by ""catalogue i f° r the deep simulation. 



between the true errors and those estimated from the data 
alone is not completely clear. One possible origin is erro- 
neous fluctuations in the background, which could be elim- 
inated by requiring a smooth background across the entire 
image, rather than fitting local backgrounds. Another fac- 
tor will be the incompleteness of the input lists, due to the 
24 /im flux limit. One puzzling feature is the large variation 
between the bands. It is worth noting that this variation is 
quite similar to the variation in input Gaussian noise to the 
simulations, as quoted in Table [2] This is suggestive of a 
hard limit to the flux density error, either from the factors 
listed above, or simply noise introduced from the deblending 
of confused faint sources. 

One thing which is clear is that the other potential XID 
methods are significantly worse at accurately estimating the 
flux density error. While Method A shows a very Gaussian 




Figure 2. Deep simulation results for completeness (top) and flux density error (bottom). Completeness is for 5o" cata i ogue sources only, 
while flux density error is measured for all objects in the resulting catalogues, irregardless of significance. Results are shown for the three 
SPIRE bands; 250 pm (left), 350 pm (middle) and 500 /im (right) and the three XID algorithms considered: Method A (HerMES XID 
algorithm; red dashed line); Method A' (HerMES XID algorithm with p < 0.8 quality cut; red solid line); Method B (Sussextractor+p- 
stat; black); Method C (simple linear inversion method; blue). Lines in bottom panel represent the median flux density error for each 
band/method, while the error bars are the la and 3cr variation. Both map based, linear inversion methods (methods A and C) are seen to 
outperform the catalogue based method at faint flux densities. The low completeness of the HerMES XID algorithm at 350 and 500 pm 
can be attributed to the p < 0.8 cut. 



Table 3. Summary of completeness and flux density accuracy for XID methods. In measuring the completeness we consider only sources 
which are detected at 5o- cata i oguo and pass the quality control thresholds discussed in the text. For the flux density accuracy all sources 
which are returned by each method are considered. All values in mjy. Catastrophic failures are defined as those that are outside of the 
3<r range of the best fit Gaussian to the distributions shown in Figure [4] 
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distribution of normalised flux densities errors, the other 
methods have a long tail to very large values. To quantify 
this we define the catastrophic failure rate as the fraction 
of sources which appear at abs[5 , bs-5'true]> 3 x <Jfu, where 
a fu is the best fit value derived for a specific SPIRE band 
and method in Table [4] At 250 fim it is clear that Method A 
returns a highly Gaussian error distribution, with only ~ 2 
per cent falling outside the 3a fu range. The other meth- 
ods have a much higher catastrophic failure rate at 250 fim, 
approaching ~ 10 per cent. At the other SPIRE bands, the 
350 fim distributions are well described by a Gaussian for all 
methods, but at 500 fim it appears that there are a signifi- 



cant fraction of catastrophic failures produced by Method A. 
While these failures are still quite reliable compared to the 
very large errors returned by the other methods, it is worth 
commenting on this non-Gaussian element to the distribu- 
tion. This is likely an artifact of the model selection being 
performed at 250 fim only, as high redshift 500 fim 'peaking' 
sources will appear faintly in the 250 fim maps and hence 
are likely to be missing from the input list at 250 fim. In 
these cases the 500 fim flux will be erroneously assigned to 
the neighbouring 250 fim bright source. 

Interestingly the flux density errors for Method A and 
A' are in good agreement with the measured confusion noise 
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Figure 3. Shallow simulation results for completeness (top) and flux density error (bottom). Completeness is for 5o- cata i oguc sources 
only, while flux density error is measured for all objects in the resulting catalogues, irregardless of significance. Lines as per Figure [2] 
Again both map based, linear inversion methods (methods A and C) are seen to outperform the catalogue based method at faint flux 
densities. 



limit from Nguyen et al. (2010). Thus it is clear that our 
method is able to probe flux densities close to, if not below, 
the confusion noise. This is particularly noteworthy when 
considering that the systematic negative offset in the flux 
densities, due to issues with the background fitting, is a large 
contributer to this noise. 

As one of the key science goals of SPIRE surveys will be 
investigations of far-IR SEDs and their evolution, we need 
to understand not only the quality of the monochromatic 
SPIRE flux densities, but also any correlations between the 
bands. To investigate this in our simulated data set we look 
for correlated errors in the flux density accuracy of the deep 
simulation results. 

Figure[S]shows the relationship between the flux density 
error (t^obs — 5truc ) in the three bands for the deep simula- 
tions. It is clear that the flux density errors show a strong 
linear correlation. Quantifying these correlations with the 
Pearson correlation coefficient (r) shows that the 250 to 
350 /im and 350 to 500 /im flux density errors are strongly 
correlated (r ~ 0.8), while the 250 to 500 ^im flux density 
errors show somewhat weaker correlations (r = 0.5). Per- 
forming similar tests on the shallow simulation and other 
XID methods gives similarly strong correlations. 

Although the peculiarities of the XID algorithms could 
be partially responsible for these correlations, the under- 
lying origin must be the effect of unknown, or poorly de- 
blended, close neighbours. While our method is designed to 
optimally deblend sources in the input list, this can never 
be perfectly achieved without perfect input lits. Given this 
it is unlikely that modifications can be made to the XID 
algorithm to remove these correlations. One thing to note 
is that the correlations are dependent not only on the areal 
density of sources, but also on the far-IR colours. Since the 
FC08 mock skies include only a limited range of SED types 
and SED/flux density independent clustering it is reason- 



able to assume that the amplitude of these correlations will 
be weaker in the real data. 

Finally, while in the simulations described above the 
input positions here have no astrometric errors in real ap- 
plications the input lists and SPIRE images will have errors 
on the order of ~ 0.1 — 0.5 arcsec. Thus it is worth consid- 
ering the effects of astrometric distortions on our simulated 
dataset. To achieve this normally distributed random astro- 
metric errors are added to the input positions and the XID 
process repeated. Here we only consider the HerMES XID 
algorithm (Method A). Figure |S] shows the result of adding 
errors on the scale of 0.1-10 arcsec to our input list. It can 
be seen that the accuracy of the flux density estimates is 
insenstive to astrometric errors of < 1 — 2 arcsec. 



7 TESTING ON REAL DATA 

While it is useful to assess the completeness and flux den- 
sity accuracy of our method on totally artificial maps, we can 
also calculate these metrics for the real data by injection of 
mock sources into our observed maps. This has the advan- 
tage of reproducing the true noise properties of the data, as 
well as highlighting the confusion noise in the presence of 
angular clustering. 

As our maps are already heavily affected by confusion, 
we only inject one source at a time into the map, and then 
run the XID source extraction algorithm, taking the input 
position of the mock source and the neighbouring 24 ^m 
sources into account. For each SDP field we inject mock 
sources with flux densities in the range 3-200 mJy at random 
positions. Test positions outside of the 24 /im coverage are 
not considered. To maintain consistency with the properties 
of the real 24fj,m input catalogues, test positions within 3 
arcsec of an existing 24 /im source are also excluded, as was 
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Figure 5. Correlations in the flux density errors found in the deep simulations. All 5<r sources are shown in grey (black contours), while 
those sources that also have p < 0.8 are shown in pink (red contours). The Pearson correlation coefficient (r) is quoted in the top left 
corner of each panel. 



done with the fully articifial simulations. As a result the total 
number of test positions is ~ 3000-5000 per field, with 300- 
500 per test flux density. Figure [7] shows the completeness 
and flux accuracy determined by this method. 

It can be seen that the completeness never reaches 100 
per cent in any field. The values rise sharply from faint flux 
densities and then plateau at a quasi-constant value above a 
certain flux density level. This is due to the effect of the p < 
0.8 criteria. Somewhat counter-intuitively, this is a bigger 
problem in the fields with deeper SPIRE/MIPS 24itm data. 



The reason for this is simple; the input source density is 
much higher in the deep fields and, as we assume no prior on 
the SPIRE flux density, this affects all flux densities equally. 
If the p < 0.8 criteria is removed the residual ~ 20-50 per 
cent incompleteness in the deep fields is recovered, but at the 
expense of flux density accuracy. For sources with p < 0.8 in 
the GOODS-N field the la flux density error is 4.24, 5.23 and 
5.64 mJy for the 250, 350 and 500 pm bands, respectively. 
For sources with p > 0.8 the comparable values are 6.3, 5.9, 
and 6.9 mJy, an increase of ~ 10-50 per cent. 
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Figure 7. Completeness (top) and flux density accuracy (bottom) determined by injection of mock sources into our observed maps. 
The completeness is defined as the ratio of the number of sources recovered at > 5<j and p < 0.8 to the number of input positions. Flux 
density accuracy is defined as the RMS of the input-output flux density. In calculating the recovered flux density accuracy all input 
positions recovered with p < 0.8 are considered. Sources are injected one at a time so as to avoid increasing the source confusion. In 
each panel the results for the 250 p,m (solid line), 350 fim (dot-dashed line) and 500 fim (dashed line) bands are shown. Mean flux density 
error for each band is shown in the top right corner of each of the lower panels. 



Encouragingly, the completeness and flux density ac- 
curacy derived from source injection agrees reasonably well 
with the numbers for comparable simulations (Table[3]). Two 
small exceptions to this are the completeness at 500 fim 
in the deep simulation/GOODS-N and the error in the 500 
fim flux density in the shallow simulation/Lockman-SWIRE. 
In the first instance the observed completeness in the real 
maps is slightly lower than that found in the simulations. 
The origin of this is not clear, but it is likely caused by 
slight differences in the input list. The simulations use a 
hard S24 > 50 /iJy cut, while the real GOODS-N catalogue 
is cut at S24 > 5cr, which includes many sources fainter than 



S24 = 50 fiJy and hence has a higher surface density, leading 
to more degenerate solutions. Another possible explanation 
is that real 500 /im sources are more strongly clustered than 
those in the simulation. The second issue, the difference in 
the flux density error at 500 /im between the shallow simula- 
tion and Lockman-SWIRE, likely originates from differences 
in how the 24 and 250 /im selection affects the real and simu- 
lated datasets. Specifically, larger errors would be expected 
if the simulations predict a higher level of incompleteness 
at 500 /im due to the 24 and/or 250 /im selections. These 
problems aside the otherwise good agreement reinforces the 
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Figure 6. Effect of astrometric errors on the flux density accuracy 
of the HerMES XID algorithm (Method A). Gaussian distributed 
distortions are added to the input positions of the deep simulation 
described in Table [2] It can be seen that the accuracy of the flux 
density estimates is insenstive to astrometric errors of < 1 — 2 
arcsec. 



notion that our simulations are a realistic recreation of the 
Herschel data. 

These results highlight the effectiveness of our method 
to recover faint sources in highly confused maps, Nguyen et 
al. (2010) estimate the confusion noise in SPIRE imaging to 
be 5.8, 6.3 and 6.8 mjy at 250, 350 and 500 /im, respectively. 
It is clear that in our deepest fields, where instrumental 
noise is insignificant we are able to go significantly below 
this limit. Taking the la flux density error quoted above for 
sources in GOODS-N with p < 0.8 it is clear our methods 
are able to reduce the effect of confusion noise by a factor 
of ~20-30 per cent. 

To investigate possible systematics in our photometry 
we compare the XID catalogues to those generated using a 
combination of source detection and extraction, via Sussex- 
tractor and p-statistic methods (i.e. method B from Section 
O, to match the resulting source lists with existing 24 /im 
catalogues. 

Sussextractor source lists are provided for each SPIRE 
band by SCAT (Smith et al. 2010). These source lists contain 
all SPIRE sources detected in the maps at a significance of 
greater than 3a. The monochromatic SPIRE source lists are 
then matched to the same 24 fim catalogues used as an in- 
put to XID algorithm. The matching is performed by finding 
potential counterparts within a search radius of 10 arcsec, 
14 arcsec, and 20 arcsec for the 250, 350 and 500 pm bands 



respectively. For each of these potential IDs we calculate the 
p-statistic. The uncertainty of the SPIRE position is calcu- 
lated using Equation B8 of Ivison et al. (2007). All IDs with 
p < 0.1 are considered. A complete sample is constructed 
by taking the best ID with p < 0.1 for each SPIRE source. 
Alternatively a 'clean' sample is constructed by taking only 
those cases where the separation is less than 0.6xFWHM, 
and there is only one potential ID with p < 0.1. 

Figure [S] compares the flux density estimates for sources 
in the LH-SWIRE, LH-North fields and GOODS-N from the 
XID catalogues and the SCAT+p-stat listings. Only those 
sources which are in common and are found at greater than 
5a in both catalogues are presented. The FLS field is omit- 
ted for clarity. While there is a large scatter between the 
two estimates for all sources, a good agreement can be seen 
for the 'clean' ones. The bulk of the sources which are dis- 
crepant between the two catalogues can be found above the 
one-to-one line in FigureJS] i.e. Sxid < fiscAT- This is a nat- 
ural consequence of the XID algorithm considering all known 
sources simultaneously, and thus deblending confused cases 
into their individual 24 /im detected components. 

As a final cross-check of the completeness estimates we 
compare the raw differential number density of sources found 
in both the XID and SCAT+p-stat catalogues to the best 
estimates of the source densities from Oliver et al. (2010b). 
Figure [9] shows the differential number density of sources in 
our XID and SCAT+p-stat catalogues, in the LH-SWIRE, 
FLS, and LH-North fields. GOODS-N observations are ex- 
cluded as the number of sources detected is too small for this 
comparison to be useful. Encouragingly, at bright flux den- 
sities (i.e. > 50 mjy), both the XID and SCAT+p-stat cata- 
logues show reasonable agreement with Oliver et al. (2010b), 
although cosmic variance introduces a large scatter at the 
highest flux densities. Both the XID and SCAT+p-stat are 
seen to be incomplete at faint flux densities, although in 
each band the XID catalogue is significantly more complete 
at flux densities ~ 20-30 mjy. Taking the Oliver et al. re- 
sult to represent the total number of sources, Table [5] quotes 
the XID and SCAT+p-stat catalogue 50 per cent complete- 
ness levels. These values are in good overall agreement with 
the completeness estimates found via simulations and source 
injection. 



8 THE EFFECT OF INCOMPLETE 24 /urn 
INPUT LISTS 

We have shown that the use of existing 24 fim source lists as 
a prior input to the source extraction process is beneficial in 
terms of flux density accuracy and completeness. However 
this methodology introduces a clear bias, in that we are 
restricted to only those sources which are sufficiently bright 
at both 24 /im and SPIRE wavelengths. 

One way to estimate this incompleteness is to again use 
the mock catalogues. Again turning to the FC08 mocks we 
can estimate the fraction of sources which would be present 
in our SPIRE images, but below the limit of the overlapping 
24 /im imaging. For the SPIRE bands we use the 50 per cent 
completeness limits quoted in Table O while for the 24 /im 
flux density limit we use the values quoted in Table[TJ As the 
GOODS-N field never reaches 50 per cent completeness we 
use the value from the deep simulation presented in Table 
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Table 5. Completeness estimates (50 per cent) for XID and SCAT+p-stat catalogues for real observations of SDP fields. Completeness 
is estimated via both injection of sources into the map, and by comparing the number density of sources in the resulting catalogues to 
the best estimate of the true source density from Oliver et al. (2010b). 
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[3] The fraction of sources missing due to the 24 /jm limit in 
the FC08 mock catalogues is given in Table [6] 

However relying on mock catalogues to describe this 
incompleteness is unsatisfactory, as it is very sensitive to the 
underlying SED distribution of sources, a known weakness 
of mock catalogues based on emphirical fits to the observed 
monochromatic number density of sources. 

In order to properly determine what additional incom- 
pleteness this introduces would require precise measure- 
ments of the bivariate number density, i.e. the areal number 
density of sources as a function of both 24 pm and SPIRE 
flux density. While that analysis is beyond the scope of this 
work, we can roughly estimate the lower limit to this incom- 
pleteness in our fields by making use of the multi-tiered na- 
ture of HerMES. Specifically we can use our observations in 
GOODS-N, which contains both the deepest SPIRE imag- 
ing and deepest 24 /im catalogues available, to determine 
the number of sources which would appear in the fields with 
shallower SPIRE data, if similar quality 24 /im input cata- 
logues were available. 

Figure [TO] shows the 24 pm vs SPIRE band flux den- 
sity for 5a sources observed in GOODS-N, while Figure [TT1 
shows our best estimate of the differential number density 
of SPIRE sources as a function of 24 /im flux density de- 
rived from this data. The densities have been corrected for 
incompleteness in the 24 pm input catalogue using the re- 
sults presented in Magnelli et al. (2009). We do not correct 
for SPIRE incompleteness as we wish to estimate how many 
sources are missing from our catalogues due to solely the 
24 pm flux limits. GOODS-N sources which are 5a in the 
relevant SPIRE band are considered, with no cut on p. To 
replicate the conditions found in our other fields we impose 
artificial SPIRE flux limits on the GOODS-N data. For each 
combination of SPIRE band and field we impose the 50 per 
cent completeness limit found via source injection quoted 
in Table [S] To find a robust estimate to the total number 
of sources missing from our shallower SPIRE observations 
we integrate Figure [TT] from zero to the quoted 24 pm limit 
given in Table [JJ Below S24, = 20pJy, or in cases where no 
sources are observed, we assume the differential density re- 
mains constant from zero to the last measured value. Table[6] 
summarises the results of these calculations. Given the very 
small area covered by GOODS-N, It should be noted that 
all of these values are subject to large uncertainties, espe- 
cially at 500 /im where the number of bright sources found 
in GOODS-N is very small. 

Encouragingly in both the LH-SWIRE and LH-North 



field at 250 and 350 /im we appear to only be missing an 
additional ~ 20 per cent of sources due to the 24 pm depth. 
The shallow nature of the 24 pm imaging in FLS means we 
are missing a significant number of sources in this field, al- 
though the bulk of these will be at relatively faint fluxes 
(< 30 mjy). At 500pm all of the fields potentially suffer 
from a high degree of additional incompleteness due to the 
24 pm limits. This is understandable, as the strong negative 
fc-correction with increasing redshift at 500 pm should result 
in a population of high-z 500 pm bright, 24 pm faint sources 
which would not be found via the methodology presented 
here. 



9 FUTURE WORK 

As discussed in Section [5] the algorithm and catalogues de- 
scribed here represent the first attempt to produce robust 
XIDs for SPIRE sources, and hence many avenues for im- 
provement are possible in terms of both flux density accu- 
racy and completeness. Some clear improvements have al- 
ready been discussed above. Specifically in these area: 

• Perform the model selection stage on all three SPIRE 
bands, and possibly other MIPS and PACS data, simulta- 
neously. 

• Introduce flux density priors based on SED fitting. 

• Improve the process of background estimation and re- 
moval 

• Use an iterative process to recover faint sources missing 
from our 24pm input list. 

• Obtain accurate estimates of the true errors on our flux 
densities. 

Of these the final one, accurate estimation of the errors, 
is arguably the most critical. It is clear from the accuracy 
metrics presented in Sections and that our flux density 
errors are underestimates of the true variance in our mea- 
surements. If the true variance, and covariance, of each flux 
density estimate could be obtained the use of crude 'flags' 
for selecting robust sources, such as the p, purity, and local 
24 pm source density, would no longer be necessary. 

One way to more accurately estimate the flux density 
errors would be to make use of Monte-Carlo Markov Chain 
(MCMC) methods to perform the linear inversion. This 
would have many advantages: a MCMC approach would 
map out the true posterior probability density for not only 
the source flux density variances, but also the covariance. 



Multi-wavelength identifications for HerMES 17 



Table 6. Upper limit to the incompleteness in our SDP fields due to 24 fim flux limits, based on analysis of FC08 mock catalogues, and 
SDP observations of GOODS-N. 
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Figure 8. Comparison of flux densities from XID catalogue to 
those from the SCAT SussExtractor-derived source catalogues in 
the Lockman Hole SWIRE, Lockman Hole North and GOODS- 
N fields. SCAT sources are matched to the 24 fim sources via 
the p-statistic. Sources present in both catalogues at 5cr are pre- 
sented, as well as a "clean" sample where p < 0.1, seperation< 
0.6xFWHMspire> an d there are no alternative IDs withp < 0.1. 
XID fluxes are also required to have p < 0.8 and \ 2 < 5- Good 
agreement can be seen between the XID and SCAT flux densities 
for "clean" sources. This suggests that any discrcpencies between 
SCAT and XID are solely due to issues with source blending. 
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Figure 9. Differential number density of sources in the XID cat- 
alogues and SCAT v3 release catalogues. The black line is the 
current best estimate of true source density from Oliver et al. 
(2010b). The solid lines are XID, open symbols are SCAT in all 
panels. 



Additionally, a MCMC approach offers the natural inclu- 
sion of 'non-linear' prior knowledge on the solutions, such 
as smooth SED and background constraints. Preliminary 
testing of a hybrid MCMC method, which makes use of 
Hamiltonian dynamics to draw samples, on the simulated 
data presented in Section [6] has shown that for typical seg- 
ment sizes containing < 100 sources, MCMC chains of length 
~ 10 6 can robustly recover the true variance in the flux den- 
sity estimates, although with some loss of precision in the 
flux density estimate. Further testing with this approach is 
needed to determine if MCMC based methods can return 
the best results in terms of both precision and robust error 
estimation. 
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Figure 10. SPIRE vs. 24 /im flux density for sources in GOODS- 
N. The vertical lines indicate the depth of 24 /an imaging in each 
SDP field, while the horizontal lines indicate the 50 per cent 
completeness level of our SPIRE catalogues from the analysis 
presented in Section [7] Fields are; GOODS-N (solid), Lockman- 
North (dashed), FLS (dot-dashed), Lockman-SWIRE (dotted). 



10 CONCLUSIONS 

We have presented a new technique for producing associa- 
tions between astronomical observations at different wave- 
lengths. This method is optimised for use on Herschel SPIRE 
imaging in the presence of deep 24 fim catalogues from 
Spttzer. This technique has been used to produce XID cata- 
logues for the HerMES SDP fields. Thorough testing is per- 
formed on simulated and real data sets for both our new 
method, and two existing XID methods. Compared to a 
more traditional approach of source detection and catalogue- 
based cross identification, our map-based approach is found 
to give significantly greater accuracy in the flux density 
and recovers a much larger fraction of faint SPIRE sources. 
When compared to the Sussextractor derived source cata- 
logues of Smith et al. (2010) we find good agreement be- 
tween flux density estimates, for those sources considered 
to be 'unconfused'. We find that the use of the 24 fim prior 
input list can introduce an additional incompletness which 
is strongly dependant on the relative depth of the existing 
24 fim data to our SPIRE data. From the combination of 
deep SPIRE and Spitzer 24 /im observations in GOODS-N 
we estimate an incompleteness due to the 24 /jm limit in the 
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Figure 11. Differential number density of SPIRE sources in the 
GOODS-N field as a function of 24 fim flux density. The curves 
have been corrected for incompleteness in the 24 /im catalogue, 
but not the SPIRE incompleteness. The three panels show the re- 
sults for the 250 /an (top), 350 fim (middle) and 500 fim (bottom) 
band. The black line in each panel shows our best estimate of the 
differential number density of sources detected in the GOODS-N 
field as a function of 24 fim flux density. The other lines show the 
effect of imposing the SPIRE 50 per cent completeness limit on 
the GOODS-N catalogue. The dashed line indicates the number 
density which would be quoted if one source was observed in that 
bin. 



other SDP fields of ~ 20 per cent at 250 /jm, increasing 
to ~ 40 per cent at 500 fim. However this incompleteness 
is dominated by the faintest SPIRE sources (i.e. less than 
30-40 mjy), and we can be confident our catalogues are 
complete at bright fluxes. 
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